(1) GENERAL INFORMATION:

(i) APPLICANT: Koopman, Peter 

Goodfellow, Peter 

(ii) TITLE OF THE INVENTION: SOX-9 GENE AND PROTEIN AND

USE IN THE REGENERATION OF BONE OR CARTILAGE 

(iii) NUMBER OF SEQUENCES: 21 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Scully, Scott, Murphy & Presser 

(B) STREET: 400 Garden City Plaza 

(C) CITY: Garden City 

(D) STATE: NY 

(E) COUNTRY: U.S.A. 

(F) ZIP: 11530 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ Version 1.5

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/860,635 

(B) FILING DATE: 29-MAY-1997

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: AU PM9714 

(B) FILING DATE: 29-NOV-1994

(A) APPLICATION NUMBER: AU PM9835

(B) FILING DATE: 05-DEC-1994

(A) APPLICATION NUMBER: PCT/AU95/00799

(B) FILING DATE: 29-NOV-1995

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: DiGiglio, Frank S. 

(B) REGISTRATION NUMBER: 31,346 

(C) REFERENCE/DOCKET NUMBER: 10981

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 516-742-4343 

(B) TELEFAX: 516-742-4366 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO: 1:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AATTAAA 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

&AAAGTCCT AAAGGTGGG 



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TTTCAGGCAA ATAAGGCAG 



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TGGCAATCTA ACAGATGAGA 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TCNCAAATGT CATATATCCA 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

i£pT C C AG AT T GACTGGAACA CA 

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GCAATAAGAT ACTAATATGT AGAG 



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GTCAGCAGAA ATCCTAAAGG 



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CgACTAATGCC GATGGTTAAG 



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CGCCTCGAGG TGGCTTATCG 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:



ATCATACACA TACGATTTAG GTGAC 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GAGGAAGTCG GTGAAGAAC 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
TCGCTCATGC CGGAGGAGGA G

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GCAATCCCAG GGCCCACCGA C 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TTGGAGATGA CGTCGACTGC TC 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GCAGCGACGT CATCTCCAAC



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GCTGCTTGGA CATCCACACG T 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2249 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

AGTTTCAGTC CAGGAACTTT TCTTTGCAAG AGAGACGAGG TGCAAGTGGC 50

CCCGGTTTCG TTCTCTGTTT TCCCTCCCTC CTCCTCCGCT CCGACTCGCC 100

TTCCCCGGGT TTAGAGCCGG CAGCTGAGAC CCGCCACCCA GCGCCTCTGC 150

TAAGTGCCCG CCGCCGCAGC CCGGTGACGC GCCAACCTCC CCGGGAGCCG 200

TTCGCTCGGC GTCCGCGTCC GGGCAGCTGA GGGAAGAGGA GCCCCAGCCG 250

CCGCGGCTTC TCGCCTTTCC CGGCCACCCG CCCCCTGCCC CGGGCTCGCG 300

TATGAATCTC CTGGACCCCT TCATGAAGAT GACCGACGAG CAGGAGAAGG 350

GCCTGTCTGG CGCCCCCAGC CCCACCATGT CGGAGGACTC GGCTGGTTCG 400

CCCTGTCCCT CGGGCTCCGG CTCGGACACG GAGAACACCC GGCCCCAGGA 450

GAACACCTTC CCCAAGGGCG AGCCGGATCT GAAGAAGGAG AGCGAGGAAG 500

ATAAGTTCCC CGTGTGCATC CGCGAGGCGG TCAGCCAGGT GCTGAAGGGC 550

TACGACTGGA CGCTGGTGCC CATGCCCGTG CGCGTCAACG GCTCCAGCAA 600

GAACAAGCCA CACGTCAAGC GACCCATGAA CGCCTTCATG GTGTGGGCGC 650

AGGCTGCGCG CAGGAAGCTG GCAGACCAGT ACCCGCATCT GCACAACGCG 700

GAGCTCAGCA AGACTCTGGG CAAGCTCTGG AGGCTGCTGA ACGAGAGCGA 750

GAAGAGACCC TTCGTGGAGG AGGCGGAGCG GCTGCGCGTG CAGCACAAGA 800

AAGACCACCC CGATTACAAG TACCAGCCCC GGCGGAGGAA GTCGGTGAAG 850

AACGGACAAG CGGAGGCCGA AGAGGCCACG GAACAGACTC ACATCTCTCC 900

TAATGCTATC TTCAAGGCGC TGCAAGCCGA CTCCCCACAT TCCTCCTCCG 950

GCATGAGTGA GGTGCACTCC CCGGGCGAGC ACTCTGGGCA ATCTCAGGGT 1000

CCGCCGACCC CACCCACCAC TCCCAAAACC GACGTGCAAG CTGGCAAAGT 1050

TGATCTGAAG CGAGAGGGGC GCCCTCTGGC AGAGGGGGGC AGACAGCCCC 1100

CCATCGACTT CCGCGACGTG GACATCGGTG AACTGAGCAG CGACGTCATC 1150

TCCAACATTG AGACCTTCGA CGTCAATGAG TTTGACCAAT ACTTGCCACC 1200

CAACGGCCAC CCAGGGGTTC CGGCCACCCA CGGCCAGGTC ACCTACACTG 1250

GCAGTTACGG CATCAGCAGC ACCGCACCCA CCCCTGCGAC CGCGGGCCAC 1300

GTGTGGATGT CGAAGCAGCA GGCGCCGCCC CCTCCTCCGC AGCAGCCTCC 1350

GCAGGCCCCG CAAGCCCCAC AGGCGCCTCC GCAGCAGCAA GCACCCCCGC 1400

AGCAGCCGCA GGCACCCCAG CAGCAGCAGG CACACACGCT CACCACGCTG 1450

AGCAGCGAGC CAGGCCAGTC CCAGCGAACG CACATCAAGA CGGAGCAGCT 1500

GAGCCCCAGC CACTACAGGG AGCAGCAGCA GCACTCCCCG CAACAGATCT 1550

CCTACAGCCC CTTCAACCTT CCTCACTACA GGCCCTCCTA CCCGCCCATC 1600

ACCCGTTCGG AATACGACTA CGCTGACCAT CAGAACTCCG GCTCCTACTA 1650

CAGTCACGCA GCCGGCCAGG GCTCAGGGCT CTACTCCACC TTCACTTACA 1700

TGAACCCCGC GCAGCGCCCC ATGTACACCC CCATCGGTGA CACCTCCGGG 1750

GTCCCTTCCA TCCCGCAGAC CCACAGCCCG CAGGACTGGG AACAACCAGT 1800

CTACACACAG GTCACCAGAC CCTGAGAAGA GAAAAGCTAT GGTGACAGAG 1850

CTGATCTTTT TTTTTTTTTT TTTTTAAAGA AGAAAAGAAA GAAACGAAAA 1900

AGAAAAAGCT GAAGGAAATC AAGAACCAAT TGAAATTCCT TTGGACACTT 1950

TTTTTTTTGT CCTTTCGTTA ATTTTTAAAA GACATGTAAA GGAAGGTAAC 2000

GATTGCTGGG CATTCCAGGA GAGAGACTTT AAGACTTTGT CTGAGCTCAT 2050

GACAACATAT TGCAAATGGC CGGGCCACTC GTGGCCAGAC GGACAGCACT 2100

CCTGGCCAGA TGGACCCACC AGTATCAGCG AGGAGGGGCT TGTCTCCTTC 2150

AGAGTTAACA TGGAGGACGA TTGGAGAATC TCCCTGCCTG TTTGGACTTT 2200

GTAATTATTT TTTAGCCGTA ATTAAAGAAA AAAAAAGTCC AAAAAAAAA 2249



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 507 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Asn Leu Leu Asp Pro Phe Met Lys Met Thr Asp Glu Gln Glu Lys
1               5                   10                  15

Gly Leu Ser Gly Ala Pro Ser Pro Thr Met Ser Glu Asp Ser Ala Gly
            20                  25                  30

Ser Pro Cys Pro Ser Gly Ser Gly Ser Asp Thr Glu Asn Thr Arg Pro
        35                  40                  45

Gln Glu Asn Thr Phe Pro Lys Gly Glu Pro Asp Leu Lys Lys Glu Ser
    50                  55                  60

Glu Glu Asp Lys Phe Pro Val Cys Ile Arg Glu Ala Val Ser Gln Val
65                  70                  75                  80

Leu Lys Gly Tyr Asp Trp Thr Leu Val Pro Met Pro Val Arg Val Asn
                85                  90                  95

Gly Ser Ser Lys Asn Lys Pro His Val Lys Arg Pro Met Asn Ala Phe
            100                 105                 110

Met Val Trp Ala Gln Ala Ala Arg Arg Lys Leu Ala Asp Gln Tyr Pro
        115                 120                 125

His Leu His Asn Ala Glu Leu Ser Lys Thr Leu Gly Lys Leu Trp Arg
    130                 135                 140

Leu Leu Asn Glu Ser Glu Lys Arg Pro Phe Val Glu Glu Ala Glu Arg
145                 150                 155                 160

Leu Arg Val Gln His Lys Lys Asp His Pro Asp Tyr Lys Tyr Gln Pro
                165                 170                 175

Arg Arg Arg Lys Ser Val Lys Asn Gly Gln Ala Glu Ala Glu Glu Ala
            180                 185                 190

Thr Glu Gln Thr His Ile Ser Pro Asn Ala Ile Phe Lys Ala Leu Gln
        195                 200                 205

Ala Asp Ser Pro His Ser Ser Ser Gly Met Ser Glu Val His Ser Pro
    210                 215                 220

Gly Glu His Ser Gly Gln Ser Gln Gly Pro Pro Thr Pro Pro Thr Thr
225                 230                 235                 240

Pro Lys Thr Asp Val Gln Ala Gly Lys Val Asp Leu Lys Arg Glu Gly
                245                 250                 255

Arg Pro Leu Ala Glu Gly Gly Arg Gln Pro Pro Ile Asp Phe Arg Asp
            260                 265                 270

Val Asp Ile Gly Glu Leu Ser Ser Asp Val Ile Ser Asn Ile Glu Thr
        275                 280                 285

Phe Asp Val Asn Glu Phe Asp Gln Tyr Leu Pro Pro Asn Gly His Pro
    290                 295                 300

Gly Val Pro Ala Thr His Gly Gln Val Thr Tyr Thr Gly Ser Tyr Gly
305                 310                 315                 320

Ile Ser Ser Thr Ala Pro Thr Pro Ala Thr Ala Gly His Val Trp Met
                325                 330                 335

Ser Lys Gln Gln Ala Pro Pro Pro Pro Pro Gln Gln Pro Pro Gln Ala
            340                 345                 350

Pro Gln Ala Pro Gln Ala Pro Pro Gln Gln Gln Ala Pro Pro Gln Gln
        355                 360                 365

Pro Gln Ala Pro Gln Gln Gln Gln Ala His Thr Leu Thr Thr Leu Ser
    370                 375                 380

Ser Glu Pro Gly Gln Ser Gln Arg Thr His Ile Lys Thr Glu Gln Leu
385                 390                 395                 400

Ser Pro Ser His Tyr Arg Glu Gln Gln Gln His Ser Pro Gln Gln Ile
                405                 410                 415

Ser Tyr Ser Pro Phe Asn Leu Pro His Tyr Arg Pro Ser Tyr Pro Pro
            420                 425                 430

Ile Thr Arg Ser Glu Tyr Asp Tyr Ala Asp His Gln Asn Ser Gly Ser
        435                 440                 445

Tyr Tyr Ser His Ala Ala Gly Gln Gly Ser Gly Leu Tyr Ser Thr Phe
    450                 455                 460

Thr Tyr Met Asn Pro Ala Gln Arg Pro Met Tyr Thr Pro Ile Gly Asp
465                 470                 475                 480

Thr Ser Gly Val Pro Ser Ile Pro Gln Thr His Ser Pro Gln Asp Trp
                485                 490                 495

Glu Gln Pro Val Tyr Thr Gln Val Thr Arg Pro
            500                 505

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3923 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

CGGAGCTCGA AACTGACTGG AAACTTCAGT GGCGCGGAGA CTCGCCAGTT TCAACCCCGG 60

AAACTTTTCT TTGCAGGAGG AGAAGAGAAG GGGTGCAAGC GCCCCCACTT TTGCTCTTTT 120

TCCTCCCCTC CTCCTCCTCT CCAATTCGCC TCCCCCCACT TGGAGCGGGC AGCTGTGAAC 180

TGGCCACCCC GCGCCTTCCT AAGTGCTCGC CGCGGTAGCC GGCCGACGCG CCAGCTTCCC 240

CGGGAGCCGC TTGCTCCGCA TCCGGGCAGC CGAGGGGAGA GGAGCCCGCG CCTCGAGTCC 300

CCGAGCCGCC GCGGCTTCTC GCCTTTCCCG GCCACCAGCC CCCTGCCCCG GGCCCGCGTA 360

TGAATCTCCT GGACCCCTTC ATGAAGATGA CCGACGAGCA GGAGAAGGGC CTGTCCGGCG 420

CCCCCAGCCC CACCATGTCC GAGGACTCCG CGGGCTCGCC CTGCCCGTCG GGCTCCGGCT 480

CGGACACCGA GAACACGCGG CCCCAGGAGA ACACGTTCCC CAAGGGCGAG CCCGATCTGA 540

AGAAGGAGAG CGAGGAGGAC AAGTTCCCCG TGTGCATCCG CGAGGCGGTC AGCCAGGTGC 600

TCAAAGGCTA CGACTGGACG CTGGTGCCCA TGCCGGTGCG CGTCAACGGC TCCAGCAAGA 660

ACAAGCCGCA CGTCAAGCGG CCCATGAACG CCTTCATGGT GTGGGCGCAG GCGGCGCGCA 720

GGAAGCTCGC GGACCAGTAC CCGCACTTGC ACAACGCCGA GCTCAGCAAG ACGCTGGGCA 780

AGCTCTGGAG ACTTCTGAAC GAGAGCGAGA AGCGGCCCTT CGTGGAGGAG GCGGAGCGGC 840

TGCGCGTGCA GCACAAGAAG GACCACCCGG ATTACAAGTA CCAGCCGCGG CGGAGGAAGT 900

CGGTGAAGAA CGGGCAGGCG GAGGCAGAGG AGGCCACGGA GCAGACGCAC ATCTCCCCCA 960

ACGCCATCTT CAAGGCGCTG CAGGCCGACT CGCCACACTC CTCCTCCGGC ATGAGCGAGG 1020

TGCACTCCCC CGGCGAGCAC TCGGGGCAAT CCCAGGGCCC ACCGACCCCA CCCACCACCC 1080

CCAAAACCGA CGTGCAGCCG GGCAAGGCTG ACCTGAAGCG AGAGGGGCGC CCCTTGCCAG 1140

AGGGGGGCAG ACAGCCCCCT ATCGACTTCC GCGACGTGGA CATCGGCGAG CTGAGCAGCG 1200

ACGTCATCTC CAACATCGAG ACCTTCGATG TCAACGAGTT TGACCAGTAC CTGCCGCCCA 1260

ACGGCCACCC GGGGGTGCCG GCCACGCACG GCCAGGTCAC CTACACGGGC AGCTACGGCA 1320

TCAGCAGCAC CGCGGCCACC CCGGCGAGCG CGGGCCACGT GTGGATGTCC AAGCAGCAGG 1380

CGCCGCCGCC ACCCCCGCAG CAGCCCCCAC AGGCCCCGCC GGCCCCGCAG GCGCCCCCGC 1440

AGCCGCAGGC GGCGCCCCCA CAGCAGCCGG CGGCACCCCC GCAGCAGCCA CAGGCGCACA 1500

CGCTGACCAC GCTGAGCAGC GAGCCGGGCC AGTCCCAGCG AACGCACATC AAGACGGAGC 1560

AGCTGAGCCC CAGCCACTAC AGCGAGCAGC AGCAGCACTC GCCCCAACAG ATCGCCTACA 1620

GCCCCTTCAA CCTCCCACAC TACAGCCCCT CCTACCCGCC CATCACCCGC TCACAGTACG 1680

ACTACACCGA CCACCAGAAC TCCAGCTCCT ACTACAGCCA CGCGGCAGGC CAGGGCACCG 1740

GCCTCTACTC CACCTTCACC TACATGAACC CCGCTCAGCG CCCCATGTAC ACCCCCATCG 1800

CCGACACCTC TGGGGTCCCT TCCATCCCGC AGACCCACAG CCCCCAGCAC TGGGAACAAC 1860

CCGTCTACAC ACAGCTCACT CGACCTTGAG GAGGCCTCCC ACGAAGGGCG ACGATGGCCG 1920

AGATGATCCT AAAAATAACC GAAGAAAGAG AGGACCAACC AGAATTCCCT TTGGACATTT 1980

GTGTTTTTTT GTTTTTTTAT TTTGTTTTGT TTTTTCTTCT TCTTCTTCTT CCTTAAAGAC 2040

ATTTAAGCTA AAGGCAACTC GTACCCAAAT TTCCAAGACA CAAACATGAC CTATCCAAGC 2100

GCATTACCCA CTTGTGGCCA ATCAGTGGCC AGGCCAACCT TGGCTAAATG GAGCAGCGAA 2160

ATCAACGAGA AACTGGACTT TTTAAACCCT CTTCAGAGCA AGCGTGGAGG ATGATGGAGA 2220

ATCGTGTGAT CAGTGTGCTA AATCTCTCTG CCTGTTTGGA CTTTGTAATT ATTTTTTTAG 2280

CAGTAATTAA AGAAAAAAGT CCTCTGTGAG GAATATTCTC TATTTTAAAT ATTTTTAGTA 2340

TGTACTGTGT ATGATTCATT ACCATTTTGA GGGGATTTAT ACATATTTTT AGATAAAATT 2400

AAATGCTCTT ATTTTTCCAA CAGCTAAACT ACTCTTAGTT GAACAGTGTG CCCTAGCTTT 2460

TCTTGCAACC AGAGTATTTT TGTACAGATT TGCTTTCTCT TACAAAAAGA AAAAAAAAAT 2520

CCTGTTGTAT TAACATTTAA AAACAGAATT GTGTTATGTG ATCAGTTTTG GGGGTTAACT 2580

TTGCTTAATT CCTCAGGCTT TGCGATTTAA GGAGGAGCTG CCTTAAAAAA AAATAAAGGC 2640

CTTATTTTGC AATTATGGGA GTAAACAATA GTCTAGAGAA GCATTTGGTA AGCTTTATGA 2700

TATATATATT TTTTAAAGAA GAGAAAAACA CCTTGAGCCT TAAAACGGTG CTGCTGGGAA 2760

ACATTTGCAC TCTTTTAGTG CATTTCCTCC TGCCTTTGCT TGTTCACTGC AGTCTTAAGA 2820

AAGAGGTAAA AGGCAAGCAA AGGAGATGAA ATCTGTTCTG GGAATGTTTC AGCAGCCAAT 2880

AAGTGCCCGA GCACACTGCC CCCGGTTGCC TGCCTGGGCC CCATGTGGAA GGCAGATGCC 2940

TGCTCGCTCT GTCACCTGTG CCTCTCAGAA CACCAGCAGT TAACCTTCAA GACATTCCAC 3000

TTGCTAAAAT TATTTATTTT GTAAGGAGAG GTTTTAATTA AAACAAAAAA AAATTCTTTT 3060

TTTTTTTTTT TTTTCCAATT TTACCTTCTT TAAAATAGGT TGTTGGAGCT TTCCTCAAAG 3120

GGTATGGTCA TCTGTTGTTA AATTATGTTC TTAACTGTAA CCAGTTTTTT TTTATTTATC 3180

TCTTTAATCT TTTTTATTAT TAAAAGCAAG TTTCTTTGTA TTCCTCACCC TAGATTTGTA 3240

TAAATGCCTT TTTGTCCATC CCTTTTTTCT TTGTTGTTTT TGTTGAAAAC AAACTGGAAA 3300

CTTGTTTCTT TTTTTGTATA AATGAGAGAT TGCAAATGTA GTGTATCACT GAGTCATTTG 3360

CAGTGTTTTC TGCCACAGAC CTTTGGGCTG CCTTATATTG TGTGTGTGTG TGGGTGTGTG 3420

TGTGTTTTGA CACAAAAACA ATGCAAGCAT GTGTCATCCA TATTTCTCTA CATCTTCTCT 3480

TGGAGTGAGG GAGGCTACCT GGAGGGGATC AGCCCACTGA CAGACCTTAA TCTTAATTAC 3540

TGCTGTGGCT AGAGAGTTTG AGGATTGCTT TTTAAAAAAG ACAGCAAACT TTTTTTTTTA 3600

TTTAAAAAAA GATATATTAA CAGTTTTAGA AGTCAGTAGA ATAAAATCTT AAAGCACTCA 3660

TAATATGGCA TCCTTCAATT TCTGTATAAA AGCAGATCTT TTTAAAAAAG ATACTTCTGT 3720

AACTTAAGAA ACCTGGCATT TAAATCATAT TTTGTCTTTA GGTAAAAGCT TTGGTTTGTG 3780

TTCGTGTTTT GTTTGTTTCA CTTGTTTCCC TCCCAGCCCC AAACCTTTTG TTCTCTCCGT 3840

GAAACTTACC TTTCCCTTTT TCTTTCTCTT TTTTTTTTTG TATATTATTG TTTACAATAA 3900

ATATACATTG CATTAAAAAG AAA 3923



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 509 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 





Met Asn Leu Leu Asp Pro Phe Met Lys Met Thr Asp Glu Gln Glu Lys
1               5                   10                  15

Gly Leu Ser Gly Ala Pro Ser Pro Thr Met Ser Glu Asp Ser Ala Gly
            20                  25                  30

Ser Pro Cys Pro Ser Gly Ser Gly Ser Asp Thr Glu Asn Thr Arg Pro
        35                  40                  45

Gln Glu Asn Thr Phe Pro Lys Gly Glu Pro Asp Leu Lys Lys Glu Ser
    50                  55                  60

Glu Glu Asp Lys Phe Pro Val Cys Ile Arg Glu Ala Val Ser Gln Val
65                  70                  75                  80

Leu Lys Gly Tyr Asp Trp Thr Leu Val Pro Met Pro Val Arg Val Asn
                85                  90                  95

Gly Ser Ser Lys Asn Lys Pro His Val Lys Arg Pro Met Asn Ala Phe
            100                 105                 110

Met Val Trp Ala Gln Ala Ala Arg Arg Lys Leu Ala Asp Gln Tyr Pro
        115                 120                 125

His Leu His Asn Ala Glu Leu Ser Lys Thr Leu Gly Lys Leu Trp Arg
    130                 135                 140

Leu Leu Asn Glu Ser Glu Lys Arg Pro Phe Val Glu Glu Ala Glu Arg
145                 150                 155                 160

Leu Arg Val Gln His Lys Lys Asp His Pro Asp Tyr Lys Tyr Gln Pro
                165                 170                 175

Arg Arg Arg Lys Ser Val Lys Asn Gly Gln Ala Glu Ala Glu Glu Ala
            180                 185                 190

Thr Glu Gln Thr His Ile Ser Pro Asn Ala Ile Phe Lys Ala Leu Gln
        195                 200                 205

Ala Asp Ser Pro His Ser Ser Ser Gly Met Ser Glu Val His Ser Pro
    210                 215                 220

Gly Glu His Ser Gly Gln Ser Gln Gly Pro Pro Thr Pro Pro Thr Thr
225                 230                 235                 240

Pro Lys Thr Asp Val Gln Pro Gly Lys Ala Asp Leu Lys Arg Glu Gly
                245                 250                 255

Arg Pro Leu Pro Glu Gly Gly Arg Gln Pro Pro Ile Asp Phe Arg Asp
            260                 265                 270

Val Asp Ile Gly Glu Leu Ser Ser Asp Val Ile Ser Asn Ile Glu Thr
        275                 280                 285

Phe Asp Val Asn Glu Phe Asp Gln Tyr Leu Pro Pro Asn Gly His Pro
    290                 295                 300

Gly Val Pro Ala Thr His Gly Gln Val Thr Tyr Thr Gly Ser Tyr Gly
305                 310                 315                 320

Ile Ser Ser Thr Ala Ala Thr Pro Ala Ser Ala Gly His Val Trp Met
                325                 330                 335

Ser Lys Gln Gln Ala Pro Pro Pro Pro Pro Gln Gln Pro Pro Gln Ala
            340                 345                 350

Pro Pro Ala Pro Gln Ala Pro Pro Gln Pro Gln Ala Ala Pro Pro Gln
        355                 360                 365

Gln Pro Ala Ala Pro Pro Gln Gln Pro Gln Ala His Thr Leu Thr Thr
    370                 375                 380

Leu Ser Ser Glu Pro Gly Gln Ser Gln Arg Thr His Ile Lys Thr Glu
385                 390                 395                 400

Gln Leu Ser Pro Ser His Tyr Ser Glu Gln Gln Gln His Ser Pro Gln
                405                 410                 415

Gln Ile Ala Tyr Ser Pro Phe Asn Leu Pro His Tyr Ser Pro Ser Tyr
            420                 425                 430

Pro Pro Ile Thr Arg Ser Gln Tyr Asp Tyr Thr Asp His Gln Asn Ser
        435                 440                 445

Ser Ser Tyr Tyr Ser His Ala Ala Gly Gln Gly Thr Gly Leu Tyr Ser
    450                 455                 460

Thr Phe Thr Tyr Met Asn Pro Ala Gln Arg Pro Met Tyr Thr Pro Ile
465                 470                 475                 480

Ala Asp Thr Ser Gly Val Pro Ser Ile Pro Gln Thr His Ser Pro Gln
                485                 490                 495

His Trp Glu Gln Pro Val Tyr Thr Gln Leu Thr Arg Pro
            500                 505



